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Abstract 

We consider the NP-hard problem of minimizing a convex quadratic function over 
the integer lattice Z"". We present a simple semidefinite programming (SDP) relaxation 
for obtaining a nontrivial lower bound on the optimal value of the problem. By inter¬ 
preting the solution to the SDP relaxation probabilistically, we obtain a randomized 
algorithm for finding good suboptimal solutions, and thus an upper bound on the op¬ 
timal value. The effectiveness of the method is shown for numerical problem instances 
of various sizes. 


1 Introduction 

We consider the NP-hard problem 

minimize f{x) = x^Px + 2q^x , . 

snbject to x G Z”, 

with variable x, where P G is nonzero, symmetric, and positive semidehnite, and 

q e R". 

A nnmber of other problems can be rednced to the form of Q. The integer least squares 
problem, 

minimize ||Aa; —6||| , . 

snbject to a: e Z”, 

with variable x and data A G and b G R”^, is easily rednced to the form of Q by 

expanding ont the objective fnnction. The mixed-integer version of the problem, where some 
components of x are allowed to be real nnmbers, also rednces to an eqnivalent problem with 
integer variables only. This transformation uses the Schur complement to explicitly minimize 
over the noninteger variables |BVn4l §A.5.5]. Another equivalent formulation of Q is the 
closest vector problem, 

minimize ||n — z\\\ 
subject to 2 ; G {Bx \ x G Z""}, 
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in the variable z G R™. Typically, the columns of B are linearly independent. Although not 
equivalent to Q, the shortest vector problem is also a closely related problem, which in fact, 
is reducible to solving the closest vector problem: 

minimize ||z ||2 
subject to z & {Bx \ x G Z""} 
z^O. 

Problem ([^ arises in several applications. For example, in position estimation using 
the Global Positioning System (GPS), resolving the integer ambiguities of the phase data is 
posed as a mixed-integer least squares problem jHB98] . In multiple-input multiple-output 
(MIMO) wireless communication systems, maximum likelihood detection of (vector) Boolean 
messages involves solving an integer least squares problem |JO05j . The mixed-integer version 
of the least squares problem appears in data htting applications, where some parameters 
are integer-valued. (See, e.g., [URlGj .i The closest vector problem and shortest vector 
problem have numerous application areas in cryptanalysis of public key cryptosystem such 
as RSA |NSnij . The spectral test, which is used to check the quality of linear congruential 
random number generators, is an application of the shortest vector problem |Knu971 §3.3.4]. 

1.1 Previous work 

Several hardness results are known for the integer least squares problem ([^. Given an 
instance of the integer least squares problem, dehne the approximation factor of a point x to 
be \\Ax — 6 II 2 /IIAx* — 6 |||, where x* is the global (integer) solution of ([^. Finding a constant 
factor approximation is an NP-hard problem |ABSS93] . In fact, hnding an approximation 
still remains NP-hard even when the target approximation factor is relaxed to log log 
where c > 0 is some constant |DKRS03] . 

Standard methods for hnding the global optimum of ([^, in the case of positive dehnite 
P, work by enumerating all integer points within a suitably chosen box or ellipsoid |FP851 
IBGL12] . The worst case running time of these methods is exponential in n, making it 
impractical for problems of large size. Algorithms such as Lenstra-Lenstra-Lovasz lattice 
reduction algorithm |LLL82i[SE94j can be used to hnd an approximate solution in polynomial 
time, but the approximation factor guarantee is exponential in n |GLS12| §5.3]. 

A simple lower bound on /*, the optimal value of ([^, can be obtained in 0{rP) time, 
by removing the integer constraint. If g G TZ{P), where 'R-{P) denotes the range of P, then 
this continuous relaxation has a solution x'^*® = —with objective value = —g^Pfg, 
where denotes the Moore-Penrose pseudoinverse of P. (When P is positive dehnite, the 
continuous solution reduces to x'^*® = —P~^q.) If g ^ ^(-P)) then the objective function is 
unbounded below and /* = — 00 . 

There exist diherent approaches for obtaining tighter lower bounds than The 

strongest bounds to date are based on semidehnite programming (SDP) relaxation |BGL121 
IBW13( IBHS15] . The primary drawback of the SDP-based methods is their running time. 
In particular, if these methods are applied to branch-and-bound type enumeration methods 
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to prune the search tree, the benefit of having a stronger lower bound is overshadowed by 
the additional computational cost it incurs, for all small- to medium-sized problems. Enu¬ 
meration methods still take exponential time in the number of variables, whereas solving 
SDPs can be done (practically) in 0{n^) time. Thus, for very large problems, SDP-based 
lower bounds are expected to reduce the total running time of the enumeration methods. 
However, such problems would be too big to have any practical implication. On the other 
hand, there exist weaker bounds that are quicker to compute; in [BHSlSj . for example, these 
bounds are obtained by Ending a quadratic function / that is a global underestimator of /, 
that has the additional property that the integer point minimizing / can be found simply 
by rounding to the nearest integer point. Another approach is given by |BielO] . which is 
to minimize / outside an ellipsoid that can be shown to contain no integer point. Standard 
results on the 5-procedure state that optimizing a quadratic function outside an ellipsoid, 
despite being a nonconvex problem, can be done exactly and efficiently [BEGFB91]. 

A simple upper bound on /* can be obtained by observing some properties of the problem. 
First of all, x = 0 gives a trivial upper bound of /(O) = 0, which immediately gives /* < 0. 
Another simple approximate solution can be obtained by rounding each entry of to the 
nearest integer point, x™*^. Let = /(x™^). Assuming that q G we can get a 

bound on as follows. Start by rewriting the objective function as 

fix) = (x - x‘=*^)^P(x - x"*") + Z"*^ 

Since rounding changes each coordinate by at most 1/2, we have 

n 

llxrnd _ ^cts||2 _ _ ^cts^2 < 

i=l 


It follows that 

^md _ jets ^ (^rnd _ ^.cts^Tp^^rnd _ ^cts) < ^ (n/4)cU^,,„ (3) 

\\v\\2<\/n/2 

where cUmax is the largest eigenvalue of P. Since Z^*® is a lower bound on f*, this inequality 
bounds the suboptimality of x™'^. We note that in the special case of diagonal P, the 
objective function is separable, and thus the rounded solution is optimal. However, in 
general, x™*^ is not optimal, and in fact, Z™*^ can be positive, which is even worse than the 
trivial upper bound Z(0) = 0. 

We are not aware of any efficient method of finding a strong upper bound on f*, other 
than performing a local search or similar heuristics on However, the well-known result 
by [GW95] gives provable lower and upper bounds on the optimal value of the NP-hard max¬ 
imum cut problem, which, after a simple reformulation, can be cast as a Boolean nonconvex 
quadratic problem in the following form: 

maximize x'^Wx 

subject to x} = 1, i = 1,... ,n. ^ 
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These bounds were obtained by solving an SDP relaxation of Q, and subsequently running 
a randomized algorithm using the solution of the relaxation. The expected approximation 
factor of the randomized algorithm is approximately 0.878. There exist many extensions of 
the Goemans-Williamson SDP relaxation |LMS'*~in] . In particular, |BW13] generalizes this 
idea to a more general domain Di x ■ ■ ■ x where each Di is any closed subset of R. 


1.2 Our contribution 


Our aim is to present a simple but powerful method of producing both lower and upper bounds 
on the optimal value /* of Q. Our SDP relaxation is an adaptation of |GW95j . but can 
also be recovered by appropriately using the method in |BW13j . By considering the binary 
expansion of the integer variables as a Boolean variable, we can reformulate ([^ as a Boolean 
problem and directly apply the method of [GW95] . This reformulation, however, increases 
the size of the problem and incurs additional computational cost. To avoid this, we work 
with the formulation ([^, at the expense of slightly looser SDP-based bound. We show that 
our lower bound still consistently outperforms other lower bounds shown in |BGL121 IBHS15] . 
In particular, the new bound is better than the best axis-parallel ellipsoidal bound, which 
also requires solving an SDP. 

Using the solution of the SDP relaxation, we construct a randomized algorithm that 
hnds good feasible points. In addition, we present a simple local search heuristic that can be 
applied to every point generated by the randomized algorithm. Evaluating the objective at 
these points gives an upper bound on the optimal value. This upper bound provides a good 
starting point for enumeration methods, and can save a significant amount of time during 
the search process. We show this by comparing the running time of an enumeration method, 
when different initial upper bounds on the optimal value were given. Also, we empirically 
verify that this upper bound is much stronger than simply rounding a fractional solution 
to the nearest integer point, and in fact, is near-optimal for randomly generated problem 
instances. 


2 Lagrange duality 

In this section, we discuss a Lagrangian relaxation for obtaining a nontrivial lower bound on 
/*. We make three assumptions without loss of generality. Firstly, we assume that q G TZ{P), 
so that the optimal value f* is not unbounded below. Secondly, we assume that ^ Z"', 
otherwise is already the global solution. Lastly, we assume that is in the box [0, !]"■. 
For any arbitrary problem instance, we can translate the coordinates in the following way to 
satisfy this assumption. Note that for any v E Z”, the problem below is equivalent to Q: 

minimize (x — v)'^P{x — u) -I- 2{Pv + qY[x — u) -1- f{v) 
subject to X G Z"". 

By renaming x — u to x and ignoring the constant term /(u), the problem can be rewritten 
in the form of Q. Glearly, this has different solutions and optimal value from the original 
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problem, but the two problems are related by a simple change of coordinates: point x in the 
new problem corresponds to a; + n in the original problem. To translate the coordinates, hnd 
^cts _ —p\q^ and take elementwise floor to to get Then, substitute x^'^ in place of 
V above. 

We note a simple fact that every integer point x satishes either Xj < 0 or Xj > 1 for all 
i. Equivalently, this condition can be written as Xj(xj — 1) > 0 for all i. Using this, we relax 
the integer constraint x G Z"' into a set of nonconvex quadratic constraints: Xj(xj — 1) > 0 
for all i. The following nonconvex problem is then a relaxation of ([^: 

minimize x^Px + 2q^x . . 

subject to Xi{xi — 1 ) > 0, i = 1 ,..., n. ^ 

It is easy to see that the optimal value of (js]) is greater than or equal to because x'^*® is 
not a feasible point, due to the two assumptions that x'^*® ^ Z" and x'^*® G [0,1]*^. Note that 
the second assumption was necessary, for otherwise x'^*® is the global optimum of (|^, and 
the Lagrangian relaxation described below would not produce a lower bound that is better 
than Z*^*®. 

The Lagrangian of ([^ is given by 

n 

L{x, A) = x^Px + 2q^x — ^ AiXj(xj — 1) = x'^{P — diag(A))x + 2{q + (l/2)A)'^x, 

i=\ 


where A G R” is the vector of dual variables. Dehne g(A) = q + (1/2)A. By minimizing the 
Lagrangian over x, we get the Lagrangian dual function 




—g(A)^ (P — diag(A))^ g(A) if P — diag(A) P 0 and g(A) G P(P — diag(A)) 
—oo otherwise. 


( 6 ) 

where the inequality P is with respect to the positive semidehnite cone. The Lagrangian 
dual problem is then 


maximize g{\) 
subject to A > 0, 


(7) 


in the variable A G R"^, or equivalently. 


maximize —q{\)'^ (P — diag(A))^ g(A) 
subject to P — diag(A) P 0 

q{X) G P(P - diag(A)) 

A > 0. 


By using the Schur complements, the problem can be reformulated into an SDP: 


maximize 
subject to 


-7 

P — diag(A) 
(g+(i/2)Ar 
A > 0, 


q +(1/2)A 

7 


P 0 


( 8 ) 


in the variables A G R"' and 7 G R. We note that while ([^ is derived from a nonconvex 
problem (|^, it is convex and thus can be solved in polynomial time. 
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2.1 Comparison to simple lower bound 

Due to weak duality, we have g{X) < /* for any A > 0, where g{X) is defined by (§. Using 
this property, we show a provable bound on the Lagrangian lower bound. Let 
be the simple lower bound on /*, and = sup;^>o( 7 (A) be the lower bound obtained by 
solving the Lagrangian dual. Also, let cui > ■ ■ ■ > be the eigenvalues of P. For clarity of 
notation, we use cUmax and cUmin to denote the largest and smallest eigenvalues of P, namely 
uji and (jJn- Let 1 represent a vector of an appropriate length with all components equal to 
one. Then, we have the following result. 


Theorem 1. The lower bounds satisfy 


jtsdp _ ^ 


cts \ ^^min 






-(1/2)1||2 

n/4 


2 


(9) 


Proof. When cU min = 0, the righthand side of (|^ is zero, and there is nothing else to show. 
Thus, without loss of generality, we assume that ojrain > 0, he., P >- 0. 

Let P = Qdiag(a;)Q^ be the eigenvalue decomposition of P, where u = (cui,... ,a;„). 
We consider A of the form A = al, and rewrite the dual function in terms of a, where a is 
restricted to the range a G [0, (U min ): 

g[a) = -{q + (l/2)al)^ (P - {q + (l/2)al). 


We note that ^'(O) = —q"’"P~^q = so it is enough to show the same lower bound on 
g{a) — g{0) for any particular value of a. 

Let s = Q^l, and q = Q^q. By expanding out g{a) in terms of s, q, and u, we get 


g{a)-g(0) = _yi+^£ii±S}P}^ ^ yJL 

^ oji — a ^ 


i=l 

n 

E 

2=1 


2 = 1 


Ui 


[a/oji){qi + {l/2)ojiSif - {l/A)as‘j{oJi - a) 
oji — a 


a 2 _ + (l/2)c(;jSj)^ 

T 2^^^ ~ 2^ 


i=l i=l 

an 


a 


EL 


2=1 


ujiiuji - a) 

« \ (^i + (l/2)caiSi 


Ui 


Ui — a 


By differentiating the expression above with respect to a, we get 


n 


»'(“) = i - E 

2=1 


Qi + (l/2)ci;jSj 


Ui — a 
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We note that g' is a decreasing function in a in the interval [0, ca min )- Also, at a = 0, we 
have 



n 

4 

n 

4 

n 

4 

n 

4 

> 0. 


'^^{qi/ui + {l/2)siY 
i=\ 

||diag(w)“‘(J’’g + (l/ 2 )Q’’l ||2 
||-0diag(w)“'<J’’g - {ll2)QQ'^l\\^^ 

lk“-(l/2)l|t 


The last line used the fact that is in the box [0,1]”. 

Now, we distinguish two cases depending on whether the equation g'{a) = 0 has a solution 
in the interval [0, cU min )- 


1. Suppose that g\oi*) = 0 for some a* G [0, ca min ). Then, we have 


9(a*)-9(0) = 


2=1 


f(li + {l/2)uJiSi 


Ui 


* I n 
a - 


-E 

2 = 1 


Qi + (l/2)ci;jSj 


Ui — a” 


Ui — 
2^ 


+ E 


{qi + {l/2)uiSi 


i=l 


Ui \ Ui- 


f qi + {l/2)uiSi'' 


> 


i=l 


Ui 


Ui — 


UJr\ 


E 

2=1 


Qi + (l/2)ci;jSi 


Ui — a* 


na 


*2 




Using this, we go back to the equation g'{a*) = 0 and establish a lower bound on a* 


n 

4 


E 

2=1 

n 

E 

2 = 1 


Qi + (l/2)ci;iSj 

Ui — a* 


Ui 


Ui — 


-{Qi/^i + (l/2)Sj)" 


< WteM + (l/2)a.)= 


i=l 


U. 


mm 11 cts 




xCts _ 
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From this inequality, we have 


a* > u„ 


1 - 


^cts _ (1/2)11 

n/4 


Plugging in this lower bound on a* gives 


/““■ - s(0) > S(a*) - 9(0) > 


nut 




l^cts _ (i/2)l||( 

n/4 


2. If g'{a) 7 ^ 0 for all a G [0,a;min), then by continuity of g', it must be the case that 
g\a) > 0 on the range [0, cU min h Then, for all a G [0, 


r*-9(0) > 9 (a)-g(0) 


> 


an 

T 

an 

~T 


a 


a 


E 

i=l 
1 ■ 


an 

~4 


> — -a 1 


1 - — 
Ui 


a 

^max 

a 


Qi + (l/2)ci;jSj 

Ui - a 


u 


max 


E 

i=l 

n 

4 


Qi + (l/2)cJiSj 


Ui — a 


and thus. 


na 

define 


- ^(0) > lim 


na 


nut 


^ -'-mm dWjnax 4Ct;jnax 

Therefore, in both cases, the increase in the lower bound is guaranteed to be at least 

,2 / - ( 1 / 2 ) 1 | 


nu; 

Au„ 


1 - 


| 2 \ 2 
l2 


n/4 


as claimed. 


□ 


Now we discuss several implications of Theorem First, we note that the righthand 
side of (|^ is always nonnegative, and is monotonically decreasing in — (1/2)1||2. In 
particular, when is an integer point, then we must have = /®'^p = /*• Indeed, for 

^cts ^ |g^ ]^jn^ have — (1/2)1||2 = n/4, and the righthand side of is zero. Also, 
when P is positive dehnite, he., U m\n > 0, then ([^ implies that /®'^p > 

In order to obtain the bound on /®‘^p, we only considered vectors A of the form al. 
Interestingly, solving 0 with this additional restriction is equivalent to solving the following 
problem: 

minimize x^Px + 2q^x 
subject to ||x — (1/2)1||2 > n/4. 


( 10 ) 














The nonconvex constraint enforces that x lies outside the n-dimensional sphere centered at 
(1/2)1 that has every lattice point {0,1}"' on its boundary. Even if (10) is not a convex 


problem, it can be solved exactly; the 5-lemma implies that the SDP relaxation of (10) 
is tight (see, e.g., |BEGFB94] ). For completeness, we give the dual of the SDP relaxation 


(which has the same optimal value as (10)) below, which is exactly what we used to prove 
Theorem [T1 

maximize —7 


subject to 


P — al q + (a/2)l 
{q + {a/2)lf 7 


^ 0 




3 Semidefinite relaxation 


In this section, we show another convex relaxation of (O that is equivalent to 
a different form. By introducing a new variable X = we can reformulate 


(8), but with 
(5) as: 


minimize Tr(PX) -|- 2q^x 
subject to diag(X) > x 
X = xx'^, 


in the variables X G and x G R". Observe that the constraint diag(X) > x, along 

with X = xx'^, is a rewriting of the constraint Xj(xj — 1) > 0 in (|^. 

Then, we relax the nonconvex constraint X = xx'^ into X P xx^, and rewrite it using 
the Schur complement to obtain a convex relaxation: 


minimize 
subject to 


Tr{PX) + 2g^x 
diag(X) > X 


( 11 ) 


The optimal value of problem (11) is a lower bound on /*, 
ation (|^ gives a lower bound /®^on /*. In fact, problems 
other, and they yield the same lower bound [VB96] . 


just as the Lagrangian relax- 
and 0 are duals of each 


3.1 Randomized algorithm 


The semidehnite relaxation (11) has a natural probabilistic interpretation, which can be 
used to construct a simple randomized algorithm for obtaining good suboptimal solutions, 
he., feasible points with low objective value. Let (X*,x*) be any solution to Suppose 
2 G R" is a Gaussian random variable with mean qt and covariance matrix S. Then, /i = x* 
and S = X* — x*x*^ solve the following problem of minimizing the expected value of a 
quadratic form, subject to quadratic inequalities: 


minimize E{z'^Pz -|- 2q'^z) 

subject to E{zi{zi — 1)) > 0, i = 1,... ,n, 
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in variables fi G R” and S G Intuitively, this distribution S) has mean close to 

so that the expected objective value is low, but each diagonal entry of S is large enough 
so that when 2; is sampled from the distribution, Zi{zi — 1) > 0 holds in expectation. While 
sampling 2; from S) does not give a feasible point to Q immediately, we can simply 
round it to the nearest integer point to get a feasible point. Using these observations, we 
present the following randomized algorithm. 


Algorithm 3.1 Randomized algorithm for suboptimal solution to 0. 


given number of iterations K. 


1. Solve SDP. Solve (11) to get X* and x*. 

2. Form eovariance matrix. S := X* — x*x*'^, and find Cholesky factorization = S. 

3. Initialize best point, := 0 and /t>est ._ q_ 
for k = 1,2,..., K 

4. Random sampling, z^^'l := x* + Lw, where w ~ Af(0, 1). (Same as z^^'> ~ J\f{x*, S). 

5. Round to nearest integer, x^^l := round( 2 ;^^^). 

6. Update best point. If then set x'’®®* := x^^'l and Z'’®®* ;= f{x^^^) . 


The SDP in step 1 takes 0{n^) time to solve, assuming that the number of iterations 
required by an interior point method is constant. Step 2 is dominated by the computation 
of Cholesky factorization, which uses roughly n^/3 flops. Steps 4 through 6 can be done in 
O(n^) time. The overall time complexity of the method is then 0{n^{K + n)). By choosing 
K = 0{n), the time complexity can be made 0{'n?). 


4 Greedy algorithm for obtaining a 1-opt solution 


Here we discuss a simple greedy descent algorithm that starts from an integer point, and 
iteratively moves to another integer point that has a lower objective value. This meth od 
can be applied to the simple suboptimal point or every found in Algorithm 3.1, to 
yield better suboptimal points. 

We say that x G Z” is 1-opt if the objective value at x does not improve by changing a 
single coordinate, he., f{x + cci) > f{x) for all indices i and integers c. The difference in 
the function values at x and x + ccj can be written as 


f{x + cci) - f{x) = c^Pii + CQi = Pii{c + Qi!{2Pii)f - g\l{APif), 


where g = 2{Px + q) is the gradient of / at x. It is easily seen that given i, the expression 
above is minimized when c = round(—5fj/(2Pjj)). For x to be optimal with respect to Xj, 
then c must be 0, which is the case if and only if Pa > \gi\. Thus, x is 1-opt if and only if 
diag(P) > l^fl, where the absolute value on the righthand side is taken elementwise. 

Also, observe that 

P(x -k cCi) -Fq = (Px + q) + cPi, 
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where Pi is the ith column of P. Thus, when x changes by a single coordinate, the value 
of g can be updated just by referencing a single column of P. These observations suggest a 
simple and quick greedy algorithm for hnding a 1-opt point from any given integer point x. 


Algorithm 4.1 Greedy descent algorithm for obtaining 1-opt point. 

given an initial point x G Z”. 

1. Compute initial gradient, g = 2{Px + q). 

repeat 

2. Stopping criterion, quit if diag(P) >H- 

3. Find descent direction. Find index i and integer c minimizing pPu + cgi. 

4. Update x. Xi := Xi + c. 

5. Update gradient, g := g + 2cPi. 


Initializing g takes O(u^) flops, but each subsequent iteration only takes 0{n) flops. This 
is because steps 2 and 3 only refer to the diagonal elements of P, and step 5 only uses a single 
column of P. Though we do not give an upper bound on the total number of iterations, we 
show, using numerical examples, that the average number of iterations until convergence is 
roughly 0.14n, when the initial points are sampled according to the probability distribution 
given in 13.1 The overall time complexity of Algorithm |4.l| is the n O( n^) on average. Thus, 
we can run the greedy 1-opt descent on every x^^'^ in Algorithm 3.1, without changing its 
overall time complexity 0{'n?{K -l- n)). 


5 Examples 

In this section, we consider numerical examples to show the performance of the SDP-based 
lower bound and randomized algorithm, developed in previous sections. 


5.1 Method 


We combine the techniques developed in previous sections to hnd lower and upper bounds 
on /*, as well as suboptimal solutions to the problem. By solving the simple relaxation 
and rounding the solution, we immedi atel y get a lower bound and an upper bound 

yrnd ^ 


on x™^ to get a 1-opt point, namely x 


rnd 


This 


We also run Algorithm 4.1 
gives another upper bound = /(x™'^). 

Then, we solve the semidehnite relaxation 0 to get a lower bound Using the 

solution to the SDP, we run Algorithm |3.1 to obtain suboptimal solutions, and keep the 
best suboptimal solution In addition, we run Algorithm 4T on every feasible point 

considered in step 4 of Algorithm |3.l[ and hnd the best 1-opt suboptimal solution The 
randomized algorithm thus yields two additional upper bounds on /*, namely 
and Z'’®®* = f{x^^^^). 
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The total number of iterations K in Algorithm 3.1 is set to K = 3n, so that the overall 
time complexity of the algorithm, not counting the running time of the 1-opt greedy descent 


algorithm, is 0{n^). We note that the process of sampling points and running Algorithm 4.1 
trivially parallelizes. 


5.2 Numerical examples 

We use random instances of the integer least squares problem (|^ generated in the following 
way. First, the entries of A G are sampled independently from 7V(0,1). The dimen¬ 
sions are set as m = 2n. We set q = where P = A^A, and is randomly drawn 

from the box [0,1]”. The problem is then scaled so that the simple lower bound is —1, he., 

yets ^ _qTptq = 

There are other ways to generate random problem instances. For example, the eigen- 
spectrum of P is controlled by the magnitude of m relative to n. We note that P becomes 
a near-diagonal matrix as m diverges to inhnity, because the columns of A are uncorrelated. 
This makes the integer least squares problem easier to solve. On the contrary, smaller m 
makes the problem harder to solve. Another way of generating random problem instances is 
to construct P from a predetermined eigenspectrum ui ,..., ojn, as P = Q diag(c(;)Q^, where 
Q is a random rotation matrix. This makes it easy to generate a matrix with a desired 
condition number. Our method showed the same qualitative behavior on data generated in 
these different ways, for larger or smaller m, and also for different eigenspectra. 

The SDP (|g was solved using CVX [HBH IUBOH] with the MOSEK 7.1 solver 
on a 3.40 GHz Intel Xeon machine. For problems of relatively small size n < 70, we found 
the optimal point using MILES |CZ07] . a branch-and-bound algorithm for mixed-integer 
least squares problems, implemented in MATLAB. MILES solves ([^ by enumerating lattice 
points in a suitably chosen ellipsoid. The enumeration method is based on various algorithms 
developed in |GYZ05l IAEVZ021 EEM [FP85] . 

5.3 Results 

Lower bounds. We compare various lower bounds on /*. In |BHS15] . three lower bounds 
on /* are shown, which we denote by and respectively. These bounds are 

constructed from underestimators of / that have a strong rounding property, he., the integer 
point minimizing the function is obtained by rounding the continuous solution. We denote 
the lower bound obtained by solving the following trust region problem in |Biein] by 

minimize x^Px -|- 2g^x 

subject to ||x — x^*®||2 > — x™'^||2. 

To the best of our knowledge, there is no standard benchmark test set for the integer 
least squares problem. Thus, we compared the lower bounds on randomly generated problem 
instances; for each problem size, we generated 100 random problem instances. Note that in 
all instances, the simple lower bound was = — 1. We found not only that our method 
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n 

r 

ysdp 

JSLXp 

y'qax 

y'qrd 

r 

50 

-0.8357 

-0.9162 

-0.9434 

-0.9434 

-0.9736 

-0.9736 

60 

-0.8421 

-0.9202 

-0.9459 

-0.9459 

-0.9740 

-0.9740 

70 

-0.8415 

-0.9212 

-0.9471 

-0.9471 

-0.9747 

-0.9747 

100 

N/A 

-0.9268 

-0.9509 

-0.9509 

-0.9755 

-0.9755 

500 

N/A 

-0.9401 

-0.9606 

-0.9606 

-0.9777 

-0.9777 

1000 

N/A 

-0.9435 

-0.9630 

-0.9630 

-0.9781 

-0.9781 


Table 1: Average lower bound by number of variables. 


found a tighter lower bound on average, but also that our method performed consistently 
better, he., in all problem instances, the SDP based lower bound was higher than any other 
lower bound. We found that the pairs of lower bounds and were 

practically equal, although the results of |BHS15j show that they can be all different. We 
conjecture that this disparity comes from different problem sizes and eigenspectra of the 
random problem instances. The solution /* was not computed for n > 70 as MILES was 
unable to hnd it within a reasonable amount of time. 

To see how tight compared to the simple lower bound is, we focus on the set of 100 
random problem instances of size n = 60, and show, in Figure [T| the distribution of the gap 
between /* and the lower bounds. 


Upper bounds. Algorithm 3T gives a better suboptimal solution as the number of sam¬ 
ples K grows. To test the relationship between the number of samples and the quality of 
suboptimal solutions, we considered a specihc problem instance of size n = 500 and sampled 
K = 50n points. The result suggested that 77 = 3n is a large enough number of samples for 
most problems; in order to decrease further, many more samples were necessary. All 
subsequent experiments discussed below used K = 3n as the number of sample points. 

In Table we compare different upper bounds on f* using the same set of test data 
considered above. We found that Algorithm 3.1 combined with the 1-opt heuristic gives 
a feasible point whose objective value is, on average, within 5 x 10“^ from the optimal 
value. The last column of Table indicates the percentage of the problem instances for 
which Z^iest _ j* held; we not only found near-optimal solutions, but for most problems, the 
randomized algorithm actually terminated with the global solution. We expect the same for 
larger problems, but have no evidence since there is no efficient way to verify optimality. 

We take the same test data used to produce Figure and show histograms of the subop¬ 
timality of 7™'^, a:’’®®*, and The mean suboptimality of was 0.1025, and simply 
Ending a 1-opt point from improved the mean suboptimality to 0.0200. Algorithm 3.1 


itself, without 1-opt rehnement, produced suboptimal points of mean suboptimality 0.0153, 
and running Algorithm 4T on top of it reduced the suboptimality to 0.0002. 

Finally, we take problems of size n = 1000, where all existing global methods run too 
slowly. As the optimal value is unobtainable, we consider the gap given by the difference 
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Figure 1: Histograms of the gap between the optimal value /* and the two lower bounds 
and for 100 random problem instances of size n = 60. 


n 

r 

ybest 

ybest 

yrnd 

yrnd 

Optimal 

50 

-0.8357 

-0.8353 

-0.8240 

-0.8186 

-0.7365 

90% 

60 

-0.8421 

-0.8420 

-0.8268 

-0.8221 

-0.7397 

94% 

70 

-0.8415 

-0.8412 

-0.8240 

-0.8235 

-0.7408 

89% 

100 

N/A 

-0.8465 

-0.8235 

-0.8296 

-0.7488 

N/A 

500 

N/A 

-0.8456 

-0.7991 

-0.8341 

-0.7466 

N/A 

1000 

N/A 

-0.8445 

-0.7924 

-0.8379 

-0.7510 

N/A 


Table 2: Average upper bound by number of variables. 
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Figure 2: Histograms of the suboptimality of /™'^, and for 100 random 

problem instances of size n = 60. 


between the upper and lower bounds. Figure |^shows histograms of the four optimality gaps 
obtained from our method, namely and Z'’*^®* — Z®'^^. The 

mean value of these quantities were: 0.2490, 0.1621, 0.1511, and 0.0989. As seen in Table 
we believe that the best upper bound Z*^®®* is very close to the optimal value, whereas the 
lower bound is farther away from the optimal value. 

Running time. In Table we compare the running time of our method and that of 
MILES for problems of various sizes. The running time of MILES varied depending on 
particular problem instances. For example, for n = 70, the running time varied from 6.6 
seconds to 25 minutes. Our method showed more consistent running time, and terminated 
within 3 minutes for every problem instance of the biggest size n = 1000. It should be noted 
that MILES always terminates with a global optimum, whereas our method does not have 
such a guarantee, even though the experimental results suggest that the best suboptimal 
point found is close to optimal. From the breakdown of the running time of our method, 
we see that none of the three parts of our method clearly dominates the total running time. 
We also note that the total running time grows subcubically, despite the theoretical running 
time of 0{'n?). 

In a practical setting, if a near-optimal solution is good enough, then running Algo¬ 
rithm is more effective than enumeration algorithms. Algorithm |3.1| is particularly useful 
if the problem size is 60 or more; this is the range where branch-and-bound type algorithms 
become unviable due to their exponential running time. In practice, one may run an enu¬ 
meration algorithm (such as MILES) and terminate after a certain amount of time and use 
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Figure 3: Histograms of the four optimality gaps for 100 random problem instances of 
large size n = 1000. 


n 

Total running time 

Breakdown of running time 

MILES 

Our method 

SDP 

Random Sampling 

Greedy 1-opt 

50 

3.069 

0.397 

0.296 

0.065 

0.036 

60 

28.71 

0.336 

0.201 

0.084 

0.051 

70 

378.2 

0.402 

0.249 

0.094 

0.058 

100 

N/A 

0.690 

0.380 

0.193 

0.117 

500 

N/A 

20.99 

12.24 

4.709 

4.045 

1000 

N/A 

135.1 

82.38 

28.64 

24.07 


Table 3: Average running time of MILES and our method in seconds (left), and breakdown 
of the running time of our method (right). 
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n 

ymiles _ j’hest 

j’hest ^ ymiles 

50 

0.0117 

98% 

60 

0.0179 

99% 

70 

0.0230 

99% 

100 

0.0251 

100 % 

500 

0.0330 

100 % 

1000 

0.0307 

100 % 


Table 4: Comparison of the best suboptimal point found by our method and by MILES, 
when alloted the same running time. 


n 

Iterations 

50 

6.06 

60 

7.35 

70 

8.59 

100 

11.93 

500 

65.89 

1000 

141.1 


Table 5: Average number of iterations of Algorithm 


4.1 


the best found point as an approximation. To show that Algorithm is a better approach 
of obtaining a good suboptimal solution, we compar e ou r method against MILES in the 
following way. First, we compute via Algorithm 


3.1 


and record the running time T. 
Then, we run MILES on the same problem instance, but with time limit T, he., we terminate 
MILES when its running time exceeds T, and record the best suboptimal point found. Let 
ymiies objective value of this suboptimal point. Table shows the average value of 

ymiies _ jbest percentage of problem instances for which held. The 

experiment was performed on 100 random problem instances of each size. We observe that 
on every problem instance of size n > 100, our method produced a better point than MILES, 
when alloted the same running time. 

There is no simple bound for the number of iterations that Algorithm 4T takes, as it 
depends on both P and g, as well as the initial point. In Table we give the average 
number of iterations for Algorithm 4.1 to terminate at a 1-opt point, when the initial points 
are sampled from the distribution found in §3.1[ We see that the number of iterations when 


n < 1000 is roughly 0.14n. Although the asymptotic growth in the number of iterations 
appears to be slightly superlinear, as far as practical applications are concerned, the number 
of iterations is effectively linear. This is because the SDP ( [IT| ) cannot be solved when the 
problem becomes much larger (e.g., n > 10®). 
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n 

Initial upper bound 

0 

yrnd 

jrnd 

jhest 

f* 

50 

3.046 

3.039 

2.941 

2.900 

2.537 

60 

29.02 

29.09 

28.14 

24.07 

23.56 

70 

379.6 

379.4 

361.1 

290.0 

280.6 


Table 6: Average running time of MILES, given different initial upper bounds. 


Branch-and-bound method. The results of Table suggest that enumeration methods 
that solve Q globally can utilize Algorithm 3.1 by taking the suboptimal solution and 
using its objective value as the initial bound on the optimal value. In Table we show 
the average running time of MILES versus n, when different initial bounds were provided to 
the algorithm: 0, and /*. For a fair comparison, we included the running 

time of computing the respective upper bounds in the total execution time, except in the 
case of f*. We found that when n = 70, if we start with as the initial upper bound 
of MILES, the total running time until the global solution is found is roughly 24% lower 
than running MILES with the trivial upper bound of 0. Even when /* is provided as the 
initial bound, branch-and-bound methods will still traverse a search tree to look for a better 
point (though it will eventually fail to do so). This running time, thus, can be thought as 
the baseline performance. If we compare the running time of the methods with respect to 
this baseline running time, the effect of starting with a tight upper bound becomes more 
apparent. When 0 is given as the initial upper bound, MILES spends roughly 3 more minutes 
traversing the nodes that would have been pruned if it started with /* as the initial upper 
bound. However, if the initial bound is then the additional time spent is only 10 

seconds, as opposed to 3 minutes. Finally, we note that the baseline running time accounts 
for more than 70% of the total running time even when the trivial upper bound of zero is 
given; this suggests that the core difficulty in the problem lies more in proving the optimality 
than in Ending an optimal point itself. In our experiments, the simple lower bound was 
used to prune the search tree. If tighter lower bounds are used instead, this baseline running 
time changes, depending on how easy it is to evaluate the lower bound, and how tight the 
lower bound is. 

Directly using the SDP-based lower bound to improve the performance of branch-and- 
bound methods is more challenging, due to the overhead of solving the SDP. The results 
by |BHS15j suggest that in order for a branch-and-bound scheme to achieve faster running 
time, quickly evaluating a lower bound is more important than the quality of the lower 
bound itself. Even if our SDP-based lower bound is superior to any other lower bound 
shown in related works, solving an SDP at every node in the branch-and-bound search tree is 
computationally too costly. Indeed, the SDP-based axis-parallel ellipsoidal bound in |BHS15] 
fails to improve the overall running time of a branch-and-bound algorithm when applied to 
every node in the search tree. An outstanding open problem is to find an alternative to 
that is quicker to compute. One possible approach would be to look for (easy-to-find) 
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feasible points to Q; as noted in ^ it is not necessary to solve ([^ optimally in order to 
compute a lower bound, since any feasible point of it yields a lower bound on /*. (See, 
e.g., jDonlbj .) 
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